Sequence-based protein-protein interaction prediction via support vector machine
Abstract
This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only at the gene level but also at the amino acid level, and design a procedure for selecting a negative training set to deal with the training dataset imbalance problem, i.e., interacting protein pairs are scarce relative to the large number of non-interacting protein pairs. The proposed methods are validated on PPI data of Plasmodium falciparum and Escherichia coli, and yield predictive accuracies of 93.8% and 95.3%, respectively. Functional annotation analysis and database search indicate that the novel predictions are worthy of future experimental validation. The new methods will be useful supplementary tools for future proteomics studies.
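As a rough illustration of the two ingredients named in the abstract (sequence encoding of protein pairs and selection of a negative set to balance the training data), the following minimal sketch may help. It is not the authors' method: the amino acid composition encoding, the random sampling of unlisted pairs as negatives, the helper names (`composition`, `encode_pair`, `sample_negatives`), and the toy sequences are all assumptions made for illustration only.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def encode_pair(seq_a, seq_b):
    """Concatenate both composition vectors into one 40-dimensional vector."""
    return composition(seq_a) + composition(seq_b)


def sample_negatives(proteins, positives, n, seed=0):
    """Randomly draw n pairs that are not listed as interacting.

    Assumption: unlisted pairs are treated as non-interacting, which is
    only a stand-in for the selection procedure the paper designs.
    """
    known = {frozenset(p) for p in positives}
    candidates = [pair for pair in combinations(sorted(proteins), 2)
                  if frozenset(pair) not in known]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical toy data: four made-up sequences and two "known" interactions.
sequences = {
    "P1": "MKTAYIAKQR",
    "P2": "GAVLIMFWPS",
    "P3": "MKKLLPTAAA",
    "P4": "DEDEKRKRHH",
}
positives = [("P1", "P2"), ("P3", "P4")]

# Draw as many negatives as positives so the two classes are balanced.
negatives = sample_negatives(sequences, positives, n=len(positives))

X = [encode_pair(sequences[a], sequences[b]) for a, b in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
```

The resulting `X` and `y` could then be passed to any SVM implementation. Sampling the same number of negatives as positives is one simple way to avoid a classifier dominated by the much larger non-interacting class.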
Similar resources
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...
Protein function prediction via graph kernels
MOTIVATION Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class ...
Identification of Surface Residues Involved in Protein-Protein Interaction – A Support Vector Machine Approach
We describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface) based on the identity of the target residue and its 10 sequence neighbors. Separate classifiers were trai...
Analyses for protein tertiary structure prediction by Mika Takata ( Under the Direction of
Protein fold classification is essential to recognition of protein tertiary structure. It is of particular interest to the structure analyses of proteins of low sequence identity with respect to proteins of known structures. We investigated the protein fold recognition problem with the Committee Support Vector Machine (CSVM) that proved efficient and effective in feature parameterization of bac...
Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks
Background: Prediction of the protein localization is among the most important issues in the bioinformatics that is used for the prediction of the proteins in the cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied for the prediction of the intracellular protein locations. These algorithms use the features extracted from pro...
Domain Linker Region Knowledge Contributes to Protein-protein Interaction Prediction
Protein-protein interaction has proven to be a valuable piece of biological knowledge and a starting point for understanding the internal workings of the cell. In this paper, we propose a novel method for protein-protein interaction prediction using only the primary structural information of the protein sequence. The method is developed based on inter-domain linker region knowledge and a combin...
Journal title:
- J. Systems Science & Complexity
Volume 23, Issue
Pages -
Publication date 2010